Input/output command rebalancing in a virtualized computer system

ABSTRACT

The present disclosure provides new methods and systems for input/output command rebalancing in virtualized computer systems. For example, a first I/O command may be received by a rebalancer from a first virtual queue in a first container. The first container may be in a first virtual machine. A second I/O command may be received from a second virtual queue in a second container, which may be located in a second virtual machine. The rebalancer may detect a priority of the first I/O command and a priority of the second I/O command. The rebalancer may then assign an updated priority to each I/O command based on a quantity of virtual queues in the virtual machine of origin and a quantity of I/O commands in the virtual queue of origin. The rebalancer may dispatch the I/O commands to a physical queue.

CROSS REFERENCES TO RELATED APPLICATIONS

This application is a continuation of and claims priority to and the benefit of U.S. patent application Ser. No. 15/276,323, filed on Sep. 26, 2016, the entire content of which is hereby incorporated by reference.

BACKGROUND

The present disclosure relates generally to the management of input/output (I/O) commands in virtualized computer systems. A typical system may include multiple virtual machines, and each virtual machine may have one or more virtual queues. Each queue may include I/O commands originating from a container or an application running in the virtual machine. The I/O commands may have priorities respectively associated with each of the I/O commands. The priorities may control, in part, the order in which the I/O commands are processed on the physical layer of the system.

SUMMARY

The present disclosure provides new and innovative methods and systems for managing I/O commands in a virtualized computer system. For example, in a method of queuing I/O commands, a rebalancer may receive a first I/O command from a first virtual queue that is in a first container or application. The first container or application may be in a first virtual machine. The rebalancer may also receive a second I/O command from a second virtual queue in a second container or application in a second virtual machine. The rebalancer may detect a first priority of the first I/O command and a second priority of the second I/O command. The rebalancer may then assign a first updated priority to the first I/O command and a second updated priority to the second I/O command. The first updated priority may be assigned based on a first quantity of virtual queues in the first virtual machine and a first quantity of I/O commands in the first virtual queue. The second updated priority may be assigned based on a second quantity of virtual queues in the second virtual machine and a second quantity of I/O commands in the second virtual queue. The rebalancer may dispatch the first I/O command to a first physical queue and the second I/O command to a second physical queue.

Additional features and advantages of the disclosed method and apparatus are described in, and will be apparent from, the following Detailed Description and the Figures.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a virtualized computer system according to an example of the present disclosure.

FIG. 2 is a block diagram of a virtual queue rebalancing system according to an example of the present disclosure.

FIG. 3 is a flowchart illustrating an example method for rebalancing I/O commands according to an example of the present disclosure.

FIGS. 4a and 4b are flow diagrams of I/O command rebalancing according to an example of the present disclosure.

FIG. 5 is a block diagram of a virtual queue rebalancing system according to an example of the present disclosure.

DETAILED DESCRIPTION

At runtime, containers or applications may be assigned to resource control groups. Resource control groups may determine how I/O commands from containers or applications are prioritized. Containers are often deployed on virtual machines, on both public and private cloud-based networks, such as Amazon™ EC2™, Google™ Container Engine, Microsoft™ Azure™ and OpenStack™ Magnum. The priorities of I/O commands in virtualized systems are not consistently honored and, for example, I/O commands issued from a virtual machine with more virtual queues may be processed at a less frequent interval than I/O commands issued from a virtual machine with fewer queues and fewer I/O commands in those queues. This may result in the failure of I/O balancing policies in place. Such a problem may be created by the virtualization of the computer system, as the commands are transferred from a virtual layer to a physical layer. The transfer may be accomplished by software, known as a hypervisor, which serves as a layer between the virtual layer of the virtualized system and the physical hardware layer upon which the system runs. The hypervisor may assign I/O commands to different processors, by way of different physical queues, and schedule the execution of such commands on the physical layer with the help of a scheduler. The scheduler may not be aware of the underlying application or container, and accordingly, may not be able to determine the container priority, application priority, length of I/O queues, and quantity of virtualized queues. As such, a scheduler cannot ensure I/O commands are processed based on their true priorities. A weighted round robin may be used to prioritize commands; however, a weighted round robin may only consider the command priority and is not able to consider the container priority, application priority, length of I/O queues, and quantity of virtualized queues. As such, a weighted round robin cannot ensure I/O commands are processed based on their true priorities.

The present disclosure addresses the above-discussed problem using a rebalancer to recalculate the weight of each I/O command and issue a new, updated priority to the I/O command, ensuring that I/O commands are processed according to their true priorities. To accomplish this, the rebalancer may detect a quantity of virtual queues present in a virtual machine from which an I/O command originates. The rebalancer may further detect the quantity of I/O commands in each virtual queue sending I/O commands to the rebalancer and a container priority of a container from which an I/O command originates. In the case of applications instead of containers, the rebalancer may detect an application priority of an application from which an I/O command originates. The rebalancer may further detect a command priority of the I/O command.

The rebalancer may detect a priority associated with each I/O command. The command priority may be found in an I/O command header. The rebalancer may also detect an application priority of an application from which the I/O command originates. A high application priority may be associated with a real-time application such as a video-streaming application, a database operation application, or a disk maintenance application. A low application priority may be associated with an application such as a word processing application, a data backup application, or a file indexing application.

The rebalancer may then assign an updated priority to each I/O command received by the rebalancer. The updated priority may be based on the information detected by the rebalancer. Accordingly, with the updated priorities, I/O commands may be processed according to their true priorities.

The rebalancer may be part of a storage controller in a Non-Volatile Memory Express (NVMe) system. In a virtualized system, the rebalancer may enforce some of the I/O balancing policies that would otherwise fail as a result of the virtualization of the system. The rebalancer may accomplish this by detecting a container priority, application priority, length of I/O queues, and quantity of virtualized queues and then assigning an updated priority to an I/O command. Thus the rebalancer ensures that I/O commands are processed based on their true priorities. Additionally, the rebalancer may be part of an integrated device electronics (IDE) storage controller or a small computer system interface (SCSI) storage controller. The rebalancer may be part of the storage controller or, alternatively, the rebalancer may be a separate component in the hardware layer and coordinate work with the storage controller.

FIG. 1 depicts a high level system diagram of an example virtualization computer system 100. The virtualization computer system 100 may include one or more interconnected nodes 110A-B. Each node 110A-B may in turn include one or more physical processors (e.g., CPU 120A-C) communicatively coupled to memory devices (e.g., MD 130A-C) and input/output devices (e.g., I/O 140A-B). Each node 110A-B may be connected to a rebalancer 160 and a storage controller 165.

As used herein, physical processor or processor 120A-C refers to a device capable of executing instructions encoding arithmetic, logical, and/or I/O operations or commands. In one illustrative example, a processor may follow the Von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers. In an example, a processor may be a single core processor which is typically capable of executing one instruction at a time (or processing a single pipeline of instructions), or a multi-core processor which may simultaneously execute multiple instructions. In another example, a processor may be implemented as a single integrated circuit, two or more integrated circuits, or may be a component of a multi-chip module (e.g., in which individual microprocessor dies are included in a single integrated circuit package and hence share a single socket). A processor may also be referred to as a central processing unit (CPU).

As discussed herein, a memory device 130A-C refers to a volatile or non-volatile memory device, such as RAM, ROM, EEPROM, or any other device capable of storing data. As discussed herein, I/O device 140A-B refers to a device capable of providing an interface between one or more processor pins and an external device, the operation of which is based on the processor inputting and/or outputting binary data.

Processors 120A-C may be interconnected using a variety of techniques, ranging from a point-to-point processor interconnect, to a system area network, such as an Ethernet-based network. Local connections within each node 110A-B, including the connections between a processor 120A and a memory device 130A-B and between a processor 120A and a rebalancer 160 and/or storage controller 165, may be provided by one or more local buses of suitable architecture, for example, peripheral component interconnect (PCI). As used herein, a device of the host OS 186, or host device, may refer to CPU 120A-C, MD 130A-C, rebalancer 160, storage controller 165, a software device, and/or a hardware device.

As noted above, computer system 100 may run one virtual machine 170A, or multiple virtual machines 170A-B, by executing a software layer (e.g., hypervisor 180) above the hardware and below the virtual machine 170A-B, as schematically shown in FIG. 1. In an example, the hypervisor 180 may be a component of the host operating system 186 executed by the computer system 100. In another example, the hypervisor 180 may be provided by an application running on the operating system 186, or may run directly on the computer system 100 without an operating system beneath it. The hypervisor 180 may virtualize the physical layer, including processors, memory, and I/O devices, and present this virtualization to a virtual machine 170A-B as devices, including virtual processors 190A-B, virtual memory devices 192A-B, and/or virtual I/O devices 194A-B.

In an example, a virtual machine 170 may execute a guest operating system 196A which may utilize the underlying VCPU 190A, VMD 192A, and VI/O 194A. One or more applications 198A-B may be running on a virtual machine 170 under the respective guest operating system 196A. Processor virtualization may be implemented by the hypervisor 180 scheduling time slots on one or more physical processors 120A-C such that, from the guest operating system's perspective, those time slots are scheduled on a virtual processor 190A. Further, I/O command virtualization may be implemented by the hypervisor 180 and may further be managed by rebalancer 160. Rebalancer 160 may manage a command queue for the system 100. Rebalancer 160 may be a hardware component on the physical layer of system 100. Alternatively, rebalancer 160 may be a software component run on the physical layer of the system 100.

A virtual machine 170 may run any type of dependent, independent, compatible, and/or incompatible applications on the underlying hardware and OS 186. In an example, application 198A may run on virtual machine 170A and may be dependent on the underlying hardware and/or OS 186. In another example, application 198B may run on virtual machine 170B and may be independent of the underlying hardware and/or OS 186. Additionally, applications 198A-B run on a virtual machine 170A-B may be compatible with the underlying hardware and/or OS 186. Applications 198A-B run on a virtual machine 170 may be incompatible with the underlying hardware and/or OS 186. In an example, a device may be implemented as a virtual machine 170. The hypervisor 180 manages hypervisor memory 184 for the host operating system 186 as well as memory allocated to the virtual machine 170A-B and guest operating system 196. Applications 198A-B may have virtual queues for storing I/O commands before the I/O commands are dispatched for processing.

Running on the virtual machine 170A-B may be one or more containers 197A-B. Each container (e.g., container 197A) may have a respective virtual queue (e.g., 199A). The virtual queue 199A may store I/O commands from the container 197A before the I/O commands are dispatched for processing. The virtual machine 170A-B may further include an orchestrator 195A-B. The orchestrator 195A may create container 197A when needed and remove container 197A when no longer needed. Container 197A may share the kernel of virtual machine 170A. Also, container 197A may contain an application or a component of an application.

FIG. 2 is a block diagram of a submission queue rebalancing system according to an example of the present disclosure. The example system 200 may include virtual machine 170A and virtual machine 170B. Virtual machine 170A may include container 197A, container 210, and weighted round robin (WRR) 220A. Container 197A may include a queue 199A and container 210 may include queue 211. Queue 199A may include I/O commands from container 197A. Queue 199A may be a virtual submission queue in an NVMe system. Queue 211 may include I/O commands from container 210. Queue 211 may be a virtual submission queue in an NVMe system. Queue 199A and queue 211 may dispatch or send I/O commands to WRR 220A. WRR 220A may provide for the transmission of I/O commands to a rebalancer 160. Virtual machine 170A includes containers 197A & 210, and in an example, queue 199A may include more I/O commands than queue 211. However, in this example, queue 211 may include more higher prioritized I/O commands than queue 199A. Accordingly, WRR 220A may arbitrate I/O commands based on a prioritization of the I/O commands, ensuring that higher prioritized I/O commands are sent to rebalancer 160 more frequently than lower prioritized I/O commands. Alternatively, WRR 220A may employ first-in-first-out to send the first received I/O command to rebalancer 160, followed by the second I/O command, followed by the third I/O command, etc.

Virtual machine 170B may include container 197B and WRR 220B. Container 197B may include queue 199B. Queue 199B may be a virtual submission queue. Queue 199B may include I/O commands from container 197B. Queue 199B may send I/O commands to WRR 220B. WRR 220B may provide for the transmission of I/O commands to a rebalancer 160. WRR 220B may arbitrate I/O commands based on a prioritization of the I/O commands, ensuring that higher prioritized I/O commands are sent to rebalancer 160 more frequently than lower prioritized I/O commands. Alternatively, WRR 220B may employ first-in-first-out to send the first received I/O command to rebalancer 160, followed by the second I/O command, followed by the third I/O command, etc.

Virtual machine 170A-B may include one container 197B, two containers 197A & 210, or many containers 197A-B & 210. Each container 197A-B & 210 may include one queue 199A-B & 211 respectively. Containers 197A & 210 may share a queue.

Rebalancer 160 may receive I/O commands from one or more WRR 220A-B. Rebalancer 160 may detect a priority of each received I/O command. Rebalancer 160 may detect a quantity of commands in each queue 199A-B & 211. The priority of each received I/O command may depend on the type of I/O command. The priority of each received I/O command may depend on a priority of the container 197A that the I/O command came from. Rebalancer 160 may assign a new priority to each I/O command based on a quantity of virtual queues 199A & 211 in the virtual machine 170A. Rebalancer 160 may assign a new priority to each I/O command based on the detected priority of the I/O command.

Rebalancer 160 may send the I/O commands with new priorities to a physical queue 240A-B. For example, rebalancer 160 may send the I/O commands with new priorities to a single physical queue 240A. In another example, rebalancer 160 may send the I/O commands to more than one physical queue 240A-B. The I/O commands with new priorities may be sent to a WRR 220C to be reordered for processing. Physical queue 240A-B and WRR 220C may be part of rebalancer 160. Alternatively, physical queue 240A-B and WRR 220C may be discrete components that rebalancer 160 has access to. WRR 220C may receive I/O commands with new priorities from the physical queue 240A-B and may transmit the I/O commands to physical CPU 120A. WRR 220C may arbitrate I/O commands based on a prioritization of the I/O commands, ensuring that higher prioritized I/O commands are sent to the physical CPU 120A more frequently than lower prioritized I/O commands. Alternatively, WRR 220C may employ first-in-first-out to send the first received I/O command to the physical CPU 120A, followed by the second I/O command, followed by the third I/O command, etc. The CPU 120A may execute the I/O commands.

FIG. 3 is a flowchart illustrating an example method for rebalancing I/O commands according to an example of the present disclosure. Although the example method 300 is described with reference to the flowchart illustrated in FIG. 3, it will be appreciated that many other methods of performing the acts associated with the method 300 may be used. For example, the order of some of the blocks may be changed, certain blocks may be combined with other blocks, and some of the blocks described are optional. In an example, the method 300 may be performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software, or a combination of both.

The example method 300 starts with a rebalancer receiving an I/O command from a virtual queue in a first container or application, where the first container or application is in a first virtual machine (block 311). For example, rebalancer 160 may receive a standard priority read command from queue 199A, where queue 199A may be in a high priority container 197A. Container 197A may be in virtual machine 170A. Virtual queue 199A may receive I/O commands from container 197A.

In another example, rebalancer 160 may receive a standard priority read command from a queue 199A, where queue 199A may be in a high priority application 198A. Application 198A may be in virtual machine 170A and virtual queue 199A may receive I/O commands from application 198A.

The rebalancer may receive a second I/O command from a second virtual queue in a second container or application, where the second container or application is in a second virtual machine (block 313). For example, rebalancer 160 may receive a high priority data wipe command from queue 199B, where queue 199B may be in a standard priority container 197B. Container 197B may be in virtual machine 170B. Virtual queue 199B may receive I/O commands from container 197B.

In another example, rebalancer 160 may receive a high priority data wipe command from a queue 199B, where queue 199B may be in a high priority application 198B. Application 198B may be in virtual machine 170B and virtual queue 199B may receive I/O commands from application 198B. I/O commands may have priorities associated with each command. For example, the first I/O command may have come from a container 197A where all I/O commands originating in that container 197A are given a higher priority. Container 197A may be a premium container where a user paid an additional fee for increased priority and all I/O commands from that container 197A may have a higher container priority as a result. User status may be included in the container priority. A user with higher status may receive a higher container priority based on the status. User status may be based on a role the user has. Application status, the status an application has in a system, may cause a container 197A to have a higher container priority. Further, operation status, the status of an operation, may cause a container 197A to have a higher container priority. A container 197A may assign a container priority and a command priority to an I/O command. Similarly, application 198A may assign an application priority to an I/O command. A container 197A may assign an application priority to an I/O command if an application is running on the container 197A. Other command types may warrant an increased priority based on the operation associated with the I/O command. A container may be a standard usage container and all I/O commands originating in that container may be given a standard priority.

Further, commands originating from certain applications may have a different priority associated with the command, namely an application priority. For example, an I/O command from a database application may have a higher application priority than an I/O command originating from a word processing application.

The rebalancer may detect priorities of the first I/O command and priorities of the second I/O command (block 315). For example, rebalancer 160 may detect that the first I/O command is a standard priority read command and that the second I/O command is a high priority data wipe command. The I/O command may include a command field in the header of the I/O command. Rebalancer 160 may read the command field of the I/O command to detect the priority of the I/O command. The command field may include information about the container 197A from which the command originated and information about the type of command the I/O command is.

The rebalancer may assign an updated priority to the first I/O command based on a quantity of virtual queues in the first virtual machine and a quantity of I/O commands in the first virtual queue and further assign an updated priority to the second I/O command based on a quantity of virtual queues in the second virtual machine and a quantity of I/O commands in the second virtual queue (block 317). For example, rebalancer 160 may assign an updated priority to the standard priority read command based on a quantity of virtual queues (e.g., two queues 199A & 211 in virtual machine 170A of FIG. 2) in virtual machine 170A and a quantity of I/O commands in the virtual queue 199A. Virtual queue 199A may have fifty I/O commands queued waiting for transmission. Rebalancer 160 may further assign an updated priority to the high priority data wipe command based on a quantity of virtual queues (e.g., one queue 199B in virtual machine 170B of FIG. 2) in virtual machine 170B and a quantity of I/O commands in the virtual queue 199B. Virtual queue 199B may have one I/O command queued waiting for transmission. The updated priority may be based on the original priority of the I/O command multiplied by the quantity of I/O commands in the virtual queue. Rebalancer 160 may assign a higher priority to the read command coming from the queue with fifty I/O commands waiting for transmission. Rebalancer 160 may assign a lower priority to the data wipe command coming from a queue with only one I/O command waiting for transmission.
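The multiplication described in this example can be illustrated with a minimal sketch. The numeric priority levels (1 for standard, 2 for high) and the function name are assumptions made purely for illustration; the disclosure does not assign numeric values to priorities. The quantity of virtual queues in the virtual machine of origin, which may also inform the updated priority, is left out of this sketch.

```python
# Hypothetical numeric levels; the disclosure does not assign numbers to priorities.
STANDARD, HIGH = 1, 2


def updated_priority(original_priority, queue_length):
    """Original priority multiplied by the number of I/O commands
    waiting in the virtual queue of origin."""
    return original_priority * queue_length


# Standard priority read command from queue 199A, which holds fifty commands.
read_cmd = updated_priority(STANDARD, 50)
# High priority data wipe command from queue 199B, which holds one command.
wipe_cmd = updated_priority(HIGH, 1)

print(read_cmd, wipe_cmd)  # 50 2
```

Under these assumed levels the read command ends up with the larger updated priority, which agrees with the outcome described for the queue holding fifty commands.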

The rebalancer may then dispatch the first I/O command to a first physical queue and the second I/O command to a second physical queue (block 319). For example, the first I/O command, with an updated priority, may be dispatched to a first physical queue 240A and the second I/O command, with an updated priority, may be dispatched to a second physical queue 240B. Each physical queue 240A-B may have a weight assigned to the physical queue 240A-B, and may receive a proportion of I/O commands based on the weight of the queue 240A-B. A queue 240A that has a higher assigned weight may receive more I/O commands than a queue 240B that has a lower assigned weight. In another example, the I/O commands may be dispatched to a physical queue 240A based on a capacity of the physical queue to handle I/O commands or based on an available capacity. For example, a first physical queue 240A may have a higher capacity (e.g., more space, access to faster processors, etc.) to handle I/O commands than a second physical queue 240B, and may receive more I/O commands. In another example, a first physical queue 240A may have more open or available space than a second physical queue 240B and may receive more I/O commands due to the open or available space. In another example, I/O commands may be dispatched evenly to all physical queues 240A-B in a system.
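One possible way to give each physical queue a share of I/O commands proportional to its weight is sketched below. The integer weights (2 for queue 240A, 1 for queue 240B), the repeating schedule, and the function name are illustrative assumptions; the disclosure does not specify how weights are represented or applied.

```python
from itertools import cycle


def dispatch_by_weight(commands, weights):
    """Assign each command to a physical queue in proportion to
    integer queue weights, using a repeating schedule."""
    # Expand the weights into a schedule,
    # e.g. {"240A": 2, "240B": 1} -> 240A, 240A, 240B, 240A, ...
    schedule = cycle([name for name, w in weights.items() for _ in range(w)])
    queues = {name: [] for name in weights}
    # zip stops when the commands run out, so the endless cycle is safe here.
    for command, name in zip(commands, schedule):
        queues[name].append(command)
    return queues


# Six placeholder commands, numbered 0-5.
queues = dispatch_by_weight(list(range(6)), {"240A": 2, "240B": 1})
print(queues)  # {'240A': [0, 1, 3, 4], '240B': [2, 5]}
```

With equal weights the same function dispatches commands evenly, which corresponds to the last alternative mentioned above.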

Alternatively, the I/O commands may be sent to the same physical queue 240A. In such an example, there may be only one physical queue, and all I/O commands from rebalancer 160 may be sent to that physical queue.

The I/O commands with updated priorities may then be dispatched from the physical queue 240A-B to a WRR 220C. The WRR 220C may reorder the I/O commands based on the updated priorities of the I/O commands. The WRR 220C may dispatch the I/O commands to a processor or CPU 120A.

FIGS. 4a and 4b are flow diagrams of I/O command rebalancing according to an example of the present disclosure. FIG. 4a depicts an example process 400 showing a first WRR 220A receiving I/O commands from containers 210A & 210B and a second WRR 220B receiving I/O commands from a third container 210C. For example, containers 210A & 210B may be on a first virtual machine 170A and container 210C may be on a second virtual machine 170B. Container 210A may be a high priority container, such that the container priority is high. In an enterprise system, a user may have paid for higher priority. Containers 210B & 210C may be standard priority containers, such that the container priority is standard. Each I/O command may have a priority assigned to the command regardless of whether the I/O command is from a standard priority container or a high priority container.

WRR 220A may receive I/O commands from a first container 210A and a second container 210B (block 439). WRR 220B may receive I/O commands from a third container 210C (block 440). The I/O commands may come from virtual queues on the containers 210A-C. WRR 220A may receive a first I/O command 410 having a low priority from a first container 210A; the first container 210A may have a high priority. The first I/O command 410 may be a write command. The first container 210A may belong to a user who paid for a higher processing priority. WRR 220A may receive a second I/O command 412 having a high priority from a second container 210B; the second container 210B may have a standard priority. The second I/O command 412 may be a wipe command and the second container 210B may belong to a standard user. WRR 220B may receive a third I/O command 414 having a low priority from a third container 210C; the third container 210C may have a standard priority. The third I/O command 414 may be a read command and the third container 210C may belong to a standard user.

WRR 220A may receive a fourth I/O command 416, a read command, having a low priority from the first container 210A. WRR 220A may receive a fifth I/O command 418, a write command, having a low priority from the second container 210B. WRR 220B may receive a sixth I/O command 420, a write command, having a low priority from the third container 210C. WRR 220A may receive a seventh I/O command 422, a read command, having a low priority from the first container 210A. WRR 220A may receive an eighth I/O command 424, a data wipe command, having a high priority from the second container 210B. WRR 220B may receive a ninth I/O command 426, a data wipe command, having a high priority from the third container 210C. WRR 220A may receive a tenth I/O command 428, a data wipe command, having a high priority from the first container 210A. WRR 220A may receive an eleventh I/O command 430, a data wipe command, having a high priority from the second container 210B. WRR 220B may receive a twelfth I/O command 432, a write command, having a low priority from the third container 210C. WRR 220A may receive a thirteenth I/O command 434, a data wipe command, having a high priority from the first container 210A. WRR 220A may receive a fourteenth I/O command 436, a write command, having a low priority from the second container 210B. WRR 220B may receive a fifteenth I/O command 438, a data wipe command, having a high priority from the third container 210C.

FIG. 4b depicts an example process 450 showing rebalancer 160 receiving I/O commands, updating command priorities and dispatching I/O commands for processing. WRR 220A may send the received I/O commands to rebalancer 160 (block 451). As part of sending the I/O commands, WRR 220A may reorder the I/O commands based on the priority of the I/O commands. WRR 220A may send I/O commands such that three out of every five I/O commands are high priority commands; this may be known as a 3/5 requirement. For example, WRR 220A may send rebalancer 160: the second I/O command 412, followed by the eighth I/O command 424, followed by the tenth I/O command 428, followed by the first I/O command 410, followed by the fourth I/O command 416, followed by the eleventh I/O command 430, followed by the thirteenth I/O command 434, followed by the fifth I/O command 418, followed by the seventh I/O command 422, followed by the fourteenth I/O command 436. In this example, the order of I/O commands is determined by the 3/5 requirement of WRR 220A. WRR 220A may use additional requirements which would accordingly change the order of I/O commands sent from WRR 220A to rebalancer 160.

WRR 220B may send the received I/O commands to rebalancer 160 (block 452). As part of sending the I/O commands, WRR 220B may reorder the I/O commands based on the priority of the I/O commands. WRR 220B may send I/O commands such that three out of every five I/O commands are high priority commands. For example, WRR 220B may send rebalancer 160: the ninth I/O command 426, followed by the fifteenth I/O command 438, followed by the third I/O command 414, followed by the sixth I/O command 420, followed by the twelfth I/O command 432. In this example, the order of I/O commands is determined by the 3/5 requirement of WRR 220B. WRR 220B may use additional requirements which would accordingly change the order of the I/O commands sent from WRR 220B to rebalancer 160.
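The orderings given for WRR 220A and WRR 220B can be reproduced with a short sketch of one way a 3/5 requirement might be applied. The function name and the round structure (up to three high priority commands, then up to two low priority commands, repeated until both groups are drained) are assumptions; the disclosure states only the ratio and the resulting order. Once one group is exhausted the ratio can no longer hold, and the sketch simply drains whatever remains.

```python
from collections import deque


def wrr_three_of_five(commands):
    """commands: list of (command number, priority) in arrival order,
    where priority is 'high' or 'low'. Returns the send order."""
    high = deque(c for c, p in commands if p == "high")
    low = deque(c for c, p in commands if p == "low")
    out = []
    while high or low:
        # Up to three high priority commands per round...
        for _ in range(3):
            if high:
                out.append(high.popleft())
        # ...followed by up to two low priority commands.
        for _ in range(2):
            if low:
                out.append(low.popleft())
    return out


# Commands received by WRR 220A (containers 210A and 210B), in arrival order.
wrr_a = [(410, "low"), (412, "high"), (416, "low"), (418, "low"), (422, "low"),
         (424, "high"), (428, "high"), (430, "high"), (434, "high"), (436, "low")]
# Commands received by WRR 220B (container 210C), in arrival order.
wrr_b = [(414, "low"), (420, "low"), (426, "high"), (432, "low"), (438, "high")]

order_a = wrr_three_of_five(wrr_a)
order_b = wrr_three_of_five(wrr_b)
print(order_a)  # [412, 424, 428, 410, 416, 430, 434, 418, 422, 436]
print(order_b)  # [426, 438, 414, 420, 432]
```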

Rebalancer 160 receives the I/O commands (block 455). Rebalancer 160 may receive the I/O commands in the order the WRR 220A-B sends them. The WRR 220A-B may send the I/O commands one at a time. Alternatively, the WRR 220A-B may send the I/O commands in groups or batches including multiple I/O commands.

Rebalancer 160 may detect a container priority of the first container 210A, a container priority of the second container 210B, and a container priority of the third container 210C. Rebalancer 160 may detect a quantity of virtual queues 199A-B & 211 of each virtual machine where the I/O commands are originating. Rebalancer 160 may also detect a quantity of virtual queues 199A-B & 211 of each container 210A-C in the virtual machines where the I/O commands are originating. Rebalancer 160 may detect a priority associated with each I/O command. The command priority may be found in an I/O command header. Rebalancer 160 may also detect an application priority of an application from which the I/O command originates. A high application priority may be associated with a real-time application such as a video-streaming application, a database operation application, or a disk maintenance application. A low application priority may be associated with an application such as a word processing application, a data backup application, or a file indexing application.

Rebalancer 160 assigns an updated priority to each I/O command (block 460). The updated priority may be based on a quantity of virtual queues 199A-B & 211 in the virtual machine 170A-B of origin and/or the quantity of I/O commands in those virtual queues 199A-B & 211. Further, the updated priority may be based on the values detected by rebalancer 160.

For example, rebalancer 160 may rank the I/O commands, and assign an updated priority to each I/O command accordingly, such that high priority I/O commands from the high priority container 210A rank above high priority I/O commands from a standard priority container 210B-C, which may be ranked higher than low priority commands from the high priority container 210A, which may in turn be ranked higher than low priority commands from a standard priority container 210B-C. After rebalancer 160 assigns an updated priority, the tenth and thirteenth I/O commands 428 & 434 may be ranked as a highest priority, as high priority I/O commands from a high priority container 210A. The second, eighth, ninth, eleventh, and fifteenth I/O commands 412, 424, 426, 430, & 438 may be ranked as a second highest priority, as high priority commands from a standard priority container 210B-C. The first, fourth, and seventh I/O commands 410, 416, & 422 may be ranked as a third highest priority, as low priority commands from a high priority container 210A. The third, fifth, sixth, twelfth, and fourteenth I/O commands 414, 418, 420, 432, & 436 may be ranked as a fourth highest priority, as low priority commands from a standard priority container 210B-C.
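The four-level ranking in this example can be sketched as a lookup on the pair of container priority and command priority. The table `COMMANDS` restates the priorities given in the example; the function name `updated_rank` and the integer ranks 1 through 4 are assumptions for illustration, not a representation used by the disclosure.

```python
# (container priority, command priority) per I/O command reference numeral.
# Container 210A is high priority; containers 210B-C are standard priority.
COMMANDS = {
    410: ("high", "low"), 412: ("standard", "high"), 414: ("standard", "low"),
    416: ("high", "low"), 418: ("standard", "low"), 420: ("standard", "low"),
    422: ("high", "low"), 424: ("standard", "high"), 426: ("standard", "high"),
    428: ("high", "high"), 430: ("standard", "high"), 432: ("standard", "low"),
    434: ("high", "high"), 436: ("standard", "low"), 438: ("standard", "high"),
}

def updated_rank(container_priority, command_priority):
    """Return 1 (highest) to 4 (lowest); command priority dominates,
    container priority breaks ties within a command priority."""
    if command_priority == "high":
        return 1 if container_priority == "high" else 2
    return 3 if container_priority == "high" else 4

# Group the reference numerals by their updated rank.
ranks = {}
for ref, (container, command) in COMMANDS.items():
    ranks.setdefault(updated_rank(container, command), []).append(ref)

for rank in sorted(ranks):
    print(rank, ranks[rank])
```

Under these assumptions, rank 1 holds commands 428 and 434, and rank 4 holds commands 414, 418, 420, 432, and 436, matching the groups listed above.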

The I/O commands with updated priorities are dispatched to a physical submission queue (block 465). The I/O commands may be sent in the same order in which rebalancer 160 received the I/O commands.

Physical queues 240A-B queue the I/O commands with updated priorities for processing (block 470).

Physical queues 240A-B transmit the I/O commands with updated priorities for processing (block 475). The physical queues 240A-B may dispatch or send the I/O commands to a WRR 220C for reordering. The WRR 220C may ensure that the updated priorities are accounted for as the I/O commands are processed. WRR 220C may ensure that the I/O commands are processed by CPU 120A according to the updated priorities of the I/O commands. For example, WRR 220C may ensure all of the highest priority I/O commands 428 & 434 are processed first, all of the second highest priority I/O commands 412, 424, 426, 430, & 438 are processed second, all of the third highest priority I/O commands 410, 416, & 422 are processed third, and all of the fourth highest priority I/O commands 414, 418, 420, 432, & 436 are processed fourth.

In another example, WRR 220C may ensure that four out of every eight commands are of the highest priority, two out of every eight commands are of a second highest priority, and one out of every eight commands is of a third highest priority and one out of every eight commands is of the remaining priorities. If there are no commands of a given priority, WRR 220C may use the next highest available priority to fulfill the higher priority ratio. In such an example, WRR 220C may send the highest priority I/O commands 428 and 434, followed by four of the second highest priority I/O commands 412, 424, 426, 438, followed by one of the third highest priority commands 410, followed by one of the remaining priority commands, 414.
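One round of the 4/2/1/1 scheme, including the fallback to the next highest available priority, might look like the following sketch. The function name `wrr_round` and the ordering of commands within each tier are assumptions; the disclosure does not state which commands of a tier are picked first. With numerical ordering inside each tier, the sketch selects command 430 where the example above lists command 438, but the pattern of tiers per round is the same.

```python
from collections import deque

def wrr_round(tiers, quotas=(4, 2, 1, 1)):
    """Send one round of commands. tiers[0] is the highest priority queue;
    the last entry holds the remaining priorities."""
    sent = []
    for tier, quota in enumerate(quotas):
        for _ in range(quota):
            # Use this tier if it has commands, else the next highest
            # available tier below it (the fallback described above).
            source = next((q for q in tiers[tier:] if q), None)
            if source is None:
                break  # nothing left at or below this tier
            sent.append(source.popleft())
    return sent

tiers = [
    deque([428, 434]),                 # highest priority
    deque([412, 424, 426, 430, 438]),  # second highest
    deque([410, 416, 422]),            # third highest
    deque([414, 418, 420, 432, 436]),  # remaining priorities
]
first_round = wrr_round(tiers)
print(first_round)  # [428, 434, 412, 424, 426, 430, 410, 414]
```

Only two highest priority commands exist, so the two unused highest priority slots are filled from the second tier, giving two highest, four second highest, one third highest, and one remaining priority command in the round.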

Without rebalancer 160 to assign updated priorities, a high priority container 210A may not receive the resources desired and/or expected. For example, without rebalancer 160, and without any WRR 220A-C, I/O commands may be processed in a first-in-first-out (FIFO) scheme. Accordingly, the order of I/O commands in a FIFO scheme would be the first I/O command 410, followed by a second I/O command 412, followed by a third I/O command 414, followed by a fourth I/O command 416, followed by a fifth I/O command 418, followed by a sixth I/O command 420, followed by a seventh I/O command 422, followed by an eighth I/O command 424, followed by a ninth I/O command 426, followed by a tenth I/O command 428, followed by an eleventh I/O command 430, followed by a twelfth I/O command 432, followed by a thirteenth I/O command 434, followed by a fourteenth I/O command 436, followed by a fifteenth I/O command 438.

In another example, without rebalancer 160 and with WRR 220A-C, only the I/O command priorities would be used by WRR 220A-C and the I/O commands would be ordered based on the command priorities. WRR 220A would send to the physical queue 240A, the second I/O command 412, followed by the eighth I/O command 424, followed by the tenth I/O command 428, followed by the first I/O command 410, followed by the fourth I/O command 416, followed by the eleventh I/O command 430, followed by the thirteenth I/O command 434, followed by the fifth I/O command 418, followed by the seventh I/O command 422, followed by the fourteenth I/O command 436. WRR 220B may send, to the physical queue 240B, the ninth I/O command 426, followed by the fifteenth I/O command 438, followed by the third I/O command 414, followed by the sixth I/O command 420, followed by the twelfth I/O command 432.

WRR 220C, using a 3/5 requirement, may then send to the processor 120A the second I/O command 412, the ninth I/O command 426, the eighth I/O command 424, the fifteenth I/O command 438, the tenth I/O command 428, the thirteenth I/O command 434, the eleventh I/O command 430, the third I/O command 414, the first I/O command 410, the sixth I/O command 420, the fourth I/O command 416, the twelfth I/O command 432, the seventh I/O command 422, the fifth I/O command 418 and the fourteenth I/O command 436. The weighted round robin may not be able to detect a quantity of virtual queues present in a virtual machine of origin, a quantity of I/O commands in each queue, a container priority or an application priority. Rebalancer 160 detects these priorities, and reassigns an updated priority to each I/O command.

Without rebalancer 160 to reassign a priority to each I/O command, command priorities and container priorities may not be consistently honored by the I/O command processing system. Rebalancer 160 advantageously provides full enforcement of priorities, which a scheduler or a WRR alone may not provide.

Even when the original priorities in two different virtual machines 170A & 170B are not compatible, a rebalancer may ensure that I/O commands are processed according to their respective priorities. For example, virtual machine 170A may include I/O command priorities of gold, silver, and bronze, while virtual machine 170B may include I/O command priorities of A and B. In such a case, the rebalancer may assign updated priorities to each I/O command by accounting for a quantity of virtual queues present in a virtual machine of origin, a quantity of I/O commands in each queue, a container priority, an application priority and a command priority.

In another example, a first virtual machine may include three virtual queues. A first virtual queue in a first application may have fifteen I/O commands, a second virtual queue in a second application may have fifty I/O commands, and a third virtual queue in a third application may have two I/O commands. A second virtual machine may have a fourth virtual queue in a fourth application with five I/O commands.

The second application may have a high application priority and each of the first, third, and fourth applications may have the same standard application priority; the I/O commands from the first virtual queue may have a high priority; and the I/O commands from the second, third, and fourth virtual queues may have a standard priority.

The first virtual machine may have a first WRR to send I/O commands to a rebalancer and the second virtual machine may have a second WRR to send I/O commands to the rebalancer. The rebalancer may receive commands from the first WRR such that it receives the high priority commands from the first virtual queue before the standard priority commands from the second and third virtual queues. The rebalancer may receive commands from the second WRR in the order the commands are issued as the second WRR only receives standard priority I/O commands from the fourth virtual queue.

The rebalancer then may determine the number of virtual queues in each virtual machine (e.g., three virtual queues in the first virtual machine and one virtual queue in the second virtual machine) and the quantity of I/O commands in each virtual queue (e.g., fifteen in the first virtual queue, fifty in the second virtual queue, two in the third virtual queue and five in the fourth virtual queue).

The rebalancer then assigns updated priorities to each I/O command and sends the I/O commands to a physical queue, where a third WRR may reorder the I/O commands based on their updated priorities. The rebalancer may assign priorities such that the standard priority I/O commands from standard application priority virtual queues with two I/O commands have a lower priority than standard priority commands from high application priority virtual queues with fifty I/O commands. The rebalancer may assign a numerical value to a command priority and a different numerical value to an application priority. The rebalancer may then multiply the numerical command priority by the numerical application priority and further by the number of I/O commands in the virtual queue of origin to obtain an updated priority. The rebalancer may further multiply by the number of virtual queues in the virtual machine to obtain the updated priority.
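The multiplication described here can be worked through for the four virtual queues of this example. The numerical values in `PRIORITY_VALUE` (1 for standard, 2 for high) and the convention that a larger product means a higher updated priority are assumptions chosen for illustration; the disclosure does not give specific values.

```python
# Assumed numerical values for priority levels (not from the disclosure).
PRIORITY_VALUE = {"standard": 1, "high": 2}

def updated_priority(command_priority, application_priority,
                     commands_in_queue, queues_in_vm):
    """Multiply command priority, application priority, the number of I/O
    commands in the virtual queue of origin, and the number of virtual
    queues in the virtual machine of origin."""
    return (PRIORITY_VALUE[command_priority]
            * PRIORITY_VALUE[application_priority]
            * commands_in_queue
            * queues_in_vm)

# (command priority, application priority, commands in queue, queues in VM)
queues = {
    "first": ("high", "standard", 15, 3),
    "second": ("standard", "high", 50, 3),
    "third": ("standard", "standard", 2, 3),
    "fourth": ("standard", "standard", 5, 1),
}
scores = {name: updated_priority(*args) for name, args in queues.items()}
print(scores)  # {'first': 90, 'second': 300, 'third': 6, 'fourth': 5}
```

With these assumed values, commands from the third virtual queue (two commands, standard application priority) score below commands from the second virtual queue (fifty commands, high application priority), as the paragraph above requires.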

In doing so, the rebalancer may ensure that standard priority commands from standard priority application queues are not processed sooner based solely on the fact that there are fewer commands in that queue. Instead, the rebalancer ensures that I/O commands are processed according to a quantity of virtual queues present in a virtual machine of origin, a quantity of I/O commands in each queue, a container priority, an application priority and a command priority.

Without the rebalancer, for example, the I/O commands would have been sent to the physical queues from a third WRR such that the fifteen high priority commands from the first virtual queue were queued in the physical queue before other commands. The remaining commands may be queued irrespective of any application priority, causing commands from the fourth virtual queue with a standard application priority to be processed at a higher frequency than the commands from the second virtual queue with high application priority. Advantageously, the rebalancer may assign an updated priority to the I/O commands ensuring that priorities are honored by the system.

FIG. 5 is a block diagram of a submission queue rebalancing system according to an example of the present disclosure. System 500 includes a memory 530 coupled to a processor 520 and a rebalancer 560. Rebalancer 560 receives I/O command 501 from virtual queue 551. Virtual queue 551 is located on container 581, which is located within virtual machine 570A. Rebalancer 560 also receives I/O command 502 from virtual queue 552. Virtual queue 552 is located on container 582, which is located within virtual machine 570B. Virtual machine 570A may include a second container 583 with virtual queue 553. Rebalancer 560 detects a priority 511 of I/O command 501 and a priority 512 of I/O command 502. Rebalancer 560 assigns an updated priority 511A to I/O command 501 and an updated priority 512A to I/O command 502. The updated priority 511A is based on a quantity of virtual queues 551 & 553 in virtual machine 570A and a quantity of I/O commands in queues 551 & 553. The updated priority 512A is based on the quantity of virtual queues 552 in virtual machine 570B and a quantity of I/O commands in queue 552. Rebalancer 560 dispatches I/O command 501 to physical queue 540A and I/O command 502 to physical queue 540B.
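The flow of FIG. 5 (receive, detect, assign, dispatch) can be sketched as follows. The class and method names, the numeric priority values, the two extra commands placed in virtual queue 553 (numbered 503 and 504 here), and the product used to combine the detected priority with the queue and command counts are all hypothetical; the figure only states which quantities the updated priority is based on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IOCommand:
    ref: int
    priority: int                           # detected priority (e.g., 511, 512)
    updated_priority: Optional[int] = None  # assigned by the rebalancer

class Rebalancer:
    def __init__(self):
        # Physical queues 540A and 540B.
        self.physical_queues = {"540A": [], "540B": []}

    def rebalance(self, command, vm_queues, physical_queue):
        """Assign an updated priority from the virtual machine of origin's
        queue count and command count, then dispatch the command."""
        n_queues = len(vm_queues)
        n_commands = sum(len(q) for q in vm_queues.values())
        # Hypothetical combining rule: a simple product.
        command.updated_priority = command.priority * n_queues * n_commands
        self.physical_queues[physical_queue].append(command)

cmd_501 = IOCommand(501, priority=2)
cmd_502 = IOCommand(502, priority=2)
# Virtual machine 570A holds virtual queues 551 and 553; 570B holds 552.
vm_570a = {"551": [cmd_501],
           "553": [IOCommand(503, priority=1), IOCommand(504, priority=1)]}
vm_570b = {"552": [cmd_502]}

rebalancer = Rebalancer()
rebalancer.rebalance(cmd_501, vm_570a, "540A")
rebalancer.rebalance(cmd_502, vm_570b, "540B")
print(cmd_501.updated_priority, cmd_502.updated_priority)  # 12 2
```

Although both commands start with the same detected priority, they leave the rebalancer with different updated priorities because their virtual machines of origin differ in queue count and command count.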

It should be understood that various changes and modifications to the examples described herein will be apparent to those skilled in the art. Such changes and modifications can be made without departing from the spirit and scope of the present subject matter and without diminishing its intended advantages. It is therefore intended that such changes and modifications be covered by the appended claims.

It will be appreciated that all of the disclosed methods and procedures described herein can be implemented using one or more computer programs or components. These components may be provided as a series of computer instructions on any conventional computer readable medium or machine readable medium, including volatile or non-volatile memory, such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware, and/or may be implemented in whole or in part in hardware components such as ASICs, FPGAs, or any other similar devices. The instructions may be configured to be executed by one or more processors, which, when executing the series of computer instructions, perform or facilitate the performance of all or part of the disclosed methods and procedures.

The invention claimed is:
 1. A method of queuing I/O commands, comprising: receiving a first I/O command having a first priority selected from the group consisting of a first application priority, a first command priority, and combinations thereof, from a first virtual queue in at least one of a first container and a first application; receiving a second I/O command having a second priority selected from the group consisting of a second application priority and a second command priority, and combinations thereof, from a second virtual queue in at least one of a second container and a second application; detecting the first priority of the first I/O command and the second priority of the second I/O command; assigning, at a rebalancer, a first updated priority to the first I/O command and a second updated priority to the second I/O command, wherein the rebalancer assigns the first updated priority based on a first quantity of virtual queues and a first quantity of I/O commands in the first virtual queue, and the rebalancer assigns the second updated priority based on a second quantity of virtual queues and a second quantity of I/O commands in the second virtual queue; and dispatching the first I/O command to a first physical queue and the second I/O command to a second physical queue.
 2. The method of claim 1, further comprising: dispatching the first I/O command from the first physical queue to a weighted round robin; and dispatching the second I/O command from the second physical queue to the weighted round robin.
 3. The method of claim 2, wherein the weighted round robin reorders the first I/O command and the second I/O command based on the first updated priority and second updated priority and dispatches the first I/O command and the second I/O command to a processor.
 4. The method of claim 1, wherein the at least one of a first container and a first application is in a first virtual machine, and wherein the at least one of a second container and a second application is in a second virtual machine.
 5. The method of claim 4, wherein the first I/O command is dispatched to the first physical queue based on capacity in the first physical queue and the second I/O command is dispatched to the second physical queue based on capacity in the second physical queue.
 6. The method of claim 1, wherein the first updated priority is further based on a first container priority and the second updated priority is further based on a second container priority.
 7. The method of claim 6, wherein the container priority is based on at least one of a user status, an application status, and an operation status.
 8. The method of claim 1, wherein the first priority is detected from a command field of the first I/O command.
 9. A system for queuing I/O commands comprising: a processor; a memory; and a rebalancer, wherein the rebalancer: receives a first I/O command having a first priority selected from the group consisting of a first application priority, a first command priority, and combinations thereof, from a first virtual queue in at least one of a first container and a first application; receives a second I/O command having a second priority selected from the group consisting of a second application priority and a second command priority, and combinations thereof, from a second virtual queue in one of a second container and a second application; detects the first priority of the first I/O command and the second priority of the second I/O command; assigns a first updated priority to the first I/O command and a second updated priority to the second I/O command, wherein the first updated priority is based on a first quantity of virtual queues and a first quantity of I/O commands in the first virtual queue, and the second updated priority is based on a second quantity of virtual queues and a second quantity of I/O commands in the second virtual queue; and dispatches the first I/O command to a first physical queue and the second I/O command to a second physical queue.
 10. The system of claim 9, further comprising a weighted round robin that reorders the first I/O command and the second I/O command based on the first updated priority and second updated priority and dispatches the first I/O command and the second I/O command from the first physical queue and the second physical queue to the processor.
 11. The system of claim 9, wherein the at least one of a first container and a first application is in a first virtual machine, and wherein the at least one of a second container and a second application is in a second virtual machine.
 12. The system of claim 9, wherein the first physical queue is different from the second physical queue.
 13. The method of claim 12, wherein the first I/O command is dispatched to a first physical queue based on capacity in the first physical queue and the second I/O command is dispatched to a second physical queue based on capacity in the second physical queue.
 14. The system of claim 9, wherein the updated priorities are further based on a container priority.
 15. The system of claim 14, wherein the container priority is based on at least one of a user status, an application status and an operation status.
 16. The system of claim 9, further comprising an orchestrator that receives a command to create the first container and creates the first container.
 17. The system of claim 16, wherein the orchestrator assigns the first priority to the first I/O command.
 18. The system of claim 9, wherein the priority of the first I/O command is detected from a command field of the first I/O command.
 19. The system of claim 9, wherein the rebalancer is a non-volatile memory express storage controller.
 20. A non-transitory computer readable medium storing instructions, which when executed, cause a rebalancer to: receive a first I/O command from a first virtual queue in at least one of a first container and a first application; receive a second I/O command from a second virtual queue in at least one of a second container and second application; detect a first priority of the first I/O command and a second priority of the second I/O command; assign a first updated priority to the first I/O command and a second updated priority to the second I/O command, wherein the first updated priority is based on a first quantity of virtual queues and a first quantity of I/O commands in the first virtual queue, and wherein the second updated priority is based on a second quantity of virtual queues and a second quantity of I/O commands in the second virtual queue; and dispatch the first I/O command to a first physical queue and the second I/O command to a second physical queue.